Pediatric chest radiograph interpretation in a real-life setting

Chest radiography is a frequently used imaging modality in children. However, only fair to moderate inter-observer agreement has been reported between chest radiograph interpreters. Most studies were not performed in real-world clinical settings. Our aims were to examine the agreement between emergency department pediatricians and board-certified radiologists in a pediatric real-life setting and to identify clinical risk factors for the discrepancies. Included were children aged 3 months to 18 years who underwent chest radiography in the emergency department outside the regular hours of radiologist interpretation. Every case was reviewed by an expert panel. Inter-observer agreement between emergency department pediatricians and board-certified radiologists was assessed by Cohen’s kappa; risk factors for disagreement were analyzed. Among 1373 cases, the level of agreement between emergency department pediatricians and board-certified radiologists was “moderate” (k = 0.505). For radiographs performed after midnight, agreement was only “fair” (k = 0.391). The expert panel identified clinically relevant disagreements in 260 (18.9%) of the radiographs. Antibiotic over-treatment was identified in 122 (8.9%) of the cases and under-treatment in 79 (5.8%). In a multivariable logistic regression, the following parameters were found to be significantly associated with disagreements: neurological background (p = 0.046), fever (p = 0.001), dyspnea (p = 0.014), and radiographs performed after midnight (p = 0.007). Conclusions: Moderate agreement was found between emergency department pediatricians and board-certified radiologists in interpreting chest radiographs. Neurological background, fever, dyspnea, and radiographs performed after midnight were identified as risk factors for disagreement. Implementing these findings could facilitate the use of radiologist expertise, save time and resources, and potentially improve patient care.
What is Known:
• Only fair to moderate inter-observer agreement has been reported between chest radiograph interpreters.
• Most studies were not performed in real-world clinical settings. Clinical risk factors for disagreements have not been reported.
What is New:
• In this study, which included 1373 cases at the emergency department, the level of agreement between interpreters was only “moderate.”
• The major clinical parameters associated with interpretation discrepancies were neurological background, fever, dyspnea, and interpretations conducted during the night shift.
Supplementary Information: The online version contains supplementary material available at 10.1007/s00431-024-05717-x.


Introduction
Chest radiography is one of the most frequently used imaging modalities in children [1,2], and aids in diagnosing several serious pediatric conditions, including pneumonia and its complications, pneumothorax, and congenital heart malformations [3]. The most frequent use of chest radiographs (CR) is as a first-line imaging modality for diagnosing community-acquired pneumonia, although clinical guidelines do not routinely recommend this in uncomplicated circumstances [4]. While their use has decreased over the past decade, CRs are still frequently performed. A prevalence of CR use of 80% was reported in emergency departments (ED) in the United States among children diagnosed with community-acquired pneumonia [2].
Misinterpretation of CR can lead to over-diagnosis and excessive treatment, or to under-diagnosis and potentially harmful consequences [5]. Despite their availability and common use, CRs are considered relatively difficult to interpret [6]. Some studies have reported only fair to moderate levels of inter-observer agreement, even among board-certified pediatric radiologists [7,8], although their inter-observer agreement is likely better than that of physicians in other specialties [8-10]. Trainees, such as pediatric residents and fellows, have shown lower levels of inter-observer agreement than board-certified physicians when each group was studied separately [9,11]. Notably, radiologist interpretation was considered a "gold standard" in some previous studies [5,11,12].
For the reasons described, it is customary in many medical institutions worldwide to provide interpretation by a board-certified radiologist in addition to the interpretation by the ED pediatricians. Interpretations by a radiologist have added value for patient safety and quality assurance [5]. Nevertheless, the burden of advanced modern imaging modalities has grown significantly, placing a substantial workload on radiologists [13]. Identifying the circumstances in which a radiologist's interpretation is most needed could facilitate effective management of medical resources. Furthermore, this approach could guide pediatric providers in determining when they should seek a radiologist's interpretation of imaging results.
A major limitation of relevant published studies is that they were mostly carried out in controlled, non-clinical settings and involved reviewing image sets outside a real-world clinical context [7-11]. In these settings, it is difficult to estimate the consequences of misinterpretation and the effect of the clinical settings on interpretation.
We aimed to examine the level of agreement between ED pediatricians and radiologists in interpreting CR of pediatric patients presenting to the ED. In this real-life setting, we hypothesized that the level of agreement would be higher than previously reported, as the patients' clinical conditions could provide useful clues to interpret the CRs even for less experienced physicians. In addition, we aimed to estimate the extent of over- and under-treatment related to interpretation discrepancies, and to identify risk factors, particularly clinical parameters, for these discrepancies. Identifying these risk factors could potentially facilitate prioritizing CRs for a radiologist's interpretation.

Patients and settings
This cross-sectional study was conducted in a tertiary pediatric hospital. Included were patients aged 3 months to 18 years, who were admitted to the ED during the year 2019. This year was chosen to avoid possible biases related to the particular conditions of the COVID-19 pandemic, which started in Israel in February 2020. The inclusion criterion was a CR performed during the patient's visit, outside the regular hours of radiologist interpretation (as explained below).
• "Radiologist interpretation": In our hospital, during working hours, a board-certified radiologist is present at the hospital and provides an "on-line" interpretation of the CR. This interpretation of imaging is published during the radiologists' working hours and almost never outside these hours, i.e., night shifts and weekends (a full timetable is provided in supplementary Fig. 1).
• "ED pediatricians": Regularly, board-certified pediatric ED specialists, board-certified pediatricians, ED fellows, and pediatric residents are present in our ED until 00:00, after which only pediatric residents are present.
To ensure that no radiologist diagnosis influenced the interpretation by the ED pediatricians, we included only CRs that were performed during night shift hours or during the weekend. The time of discharge was also considered. Specifically, patients who were discharged at times when a radiologist's interpretation could have been available were not included, to avoid skewing the results. Excluded were CRs that were performed due to trauma, when a radiologist's interpretation was not available, or when the interpretation by the ED pediatrician was not recorded in the medical chart. Further, CRs were excluded if the review of the medical records raised suspicion that the ED pediatrician may have received a radiologist's interpretation before discharging the patient (i.e., either an explicit statement that the interpretation was by a radiologist; or alternatively, the wording of the chart interpretation was similar to that of radiologists).
The study was approved by the Research Ethics Board of Rabin Medical Center (Approval No. RMC-21-0293).

Study design
In the first step, all the included CRs were retrieved through the computerized system. The system was searched for all the consecutive chest radiographs in the relevant period. Each CR was retrieved together with the patient's medical chart. Every medical chart was reviewed by a pediatric resident (BRG, SH, and YB) who applied the exclusion criteria when relevant. Demographic and clinical data were collected, in addition to the interpretations of the CRs by the ED pediatricians in the exact wording of their documentation in the medical chart. The radiologists' interpretations, as published later, were collected as well. All the relevant data were entered into an Excel file. To reach the predetermined sample size, every second consecutive chart was selected for inclusion in the study, effectively including half of the total CRs.
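
The every-second-chart selection described above is a form of systematic sampling. The following is a minimal sketch, assuming the consecutive charts are held in a list in retrieval order. The function name `select_every_second`, the `start` parameter, and the chart identifiers are hypothetical; the text does not state whether selection began with the first or the second chart.

```python
def select_every_second(charts, start=0):
    """Systematic sampling: keep every second chart from a consecutive list.

    `start` (0 or 1) is an assumption; the text does not say whether
    selection began with the first or the second chart.
    """
    if start not in (0, 1):
        raise ValueError("start must be 0 or 1")
    return charts[start::2]


# Hypothetical consecutive chart identifiers, in retrieval order.
consecutive_charts = [f"chart-{i:04d}" for i in range(1, 11)]
sampled = select_every_second(consecutive_charts)
print(len(sampled), "of", len(consecutive_charts), "charts selected")
```

Because the selection depends only on position in the consecutive list, it cannot be influenced by chart content, which is the usual motivation for this kind of sampling.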
In the next step, every case was reviewed by a panel of three board-certified pediatricians [YL, LG, and VSZ]. Each member of this "expert panel" had more than 5 years of board-certified experience, and none of them worked in the ED during the study period. Together, they assessed the agreement between the interpretation of the ED pediatricians and the interpretation that was later recorded by the board-certified radiologist. The panel members were instructed to evaluate the cases according to the WHO assessment method of CR (see below), which was previously presented to them. Only disagreements that were considered to have clinical implications (i.e., any management changes, including the initiation or discontinuation of antibiotics, a pulmonologist consultation, referrals for further imaging, etc.) were considered as discrepancies. The experts categorized each case as either "clinically relevant agreement" or "disagreement." In cases of differing opinions among the panel members, a majority was required. In a limited number of cases, the experts unanimously agreed that the cases could not be attributed to either of the two groups (i.e., "partial agreement"). These cases were excluded from the risk factors analysis (secondary outcome, below). The expert panel reviewed the final diagnoses of all the cases as well, and determined for each the main indication for the CR and the primary diagnosis at discharge.
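
The panel's decision rule can be illustrated with a short sketch. It assumes each of the three members casts one categorical vote per case; the function `panel_category` and the handling of combinations that the text does not describe (returned as `None`) are hypothetical.

```python
from collections import Counter

AGREEMENT = "clinically relevant agreement"
DISAGREEMENT = "disagreement"
PARTIAL = "partial agreement"


def panel_category(votes):
    """Combine three panel members' votes into one category.

    - AGREEMENT or DISAGREEMENT needs a majority (at least 2 of 3).
    - PARTIAL is assigned only when all three members choose it,
      mirroring the unanimity described in the text.
    - Any other combination returns None (unresolved); how such cases
      were handled is not described in the text.
    """
    if len(votes) != 3:
        raise ValueError("expected exactly three votes")
    counts = Counter(votes)
    if counts[PARTIAL] == 3:
        return PARTIAL
    for label in (AGREEMENT, DISAGREEMENT):
        if counts[label] >= 2:
            return label
    return None


print(panel_category([AGREEMENT, DISAGREEMENT, AGREEMENT]))
```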

Definitions and laboratory methods
Study outcomes and gold standards are as follows: (1) The main outcome of the study was the level of agreement between ED pediatricians and radiologists, estimated by Cohen's kappa (i.e., without the need to define a "gold standard") [14]. (2) Secondary outcome: clinical risk factors for disagreements. These were also calculated without predetermining any gold standard. (3) Secondary outcome: over-treatment versus under-treatment. For this outcome, the gold standard for chest radiograph interpretation was considered to be the interpretation by the radiologists (see "Limitations", below).
We defined CR interpretation according to the WHO assessment method of CR for the diagnosis of pneumonia in children, as 0-"no consolidation/infiltrate/effusion"; 1-"other (non-end-point) infiltrate"; 2-"significant pathology end-point consolidation"; and 3-"pneumonia with pleural effusion" [15]. The conclusions of the radiograph interpretations were determined according to the WHO definitions as "pneumonia with pleural effusion," "primary end-point pneumonia," "other infiltrate," and "no consolidation/infiltrate/effusion" [15]. Additional diagnoses that were coded included suspected foreign body aspiration, pneumothorax, and other (Table 1).
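
The four-level coding above can be represented as an ordered enumeration. This is only an illustrative sketch: the class and member names are hypothetical, and the cut-off used in `suggests_antibiotics` (codes 2 and 3) is an assumption, since the text does not specify which categories were grouped as requiring antibiotic treatment.

```python
from enum import IntEnum


class WhoCrCategory(IntEnum):
    """WHO-based CR interpretation codes as listed in the text."""
    NO_FINDING = 0               # no consolidation/infiltrate/effusion
    OTHER_INFILTRATE = 1         # other (non-end-point) infiltrate
    ENDPOINT_CONSOLIDATION = 2   # significant pathology, end-point consolidation
    PNEUMONIA_WITH_EFFUSION = 3  # pneumonia with pleural effusion


def suggests_antibiotics(category):
    """Dichotomize a category by presumed antibiotic need.

    ASSUMPTION: codes 2 and 3 are taken to indicate antibiotic treatment.
    The text does not state the actual grouping that was used.
    """
    return category >= WhoCrCategory.ENDPOINT_CONSOLIDATION


for cat in WhoCrCategory:
    print(cat.value, cat.name, suggests_antibiotics(cat))
```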

Statistical analysis
The sample size was calculated assuming about 25% bacterial pneumonia among the expected cohort [16], with an expected kappa of 0.5, a precision of 0.06, and a drop rate of about 20%, yielding n = 1335 [17]. Continuous variables were summarized as means and standard deviations, and discrete variables as numbers and percentages. Cohen's kappa was calculated for the whole cohort and among various subpopulations. Kappa results were interpreted as follows: values ≤ 0 as indicating no agreement; 0.01-0.20 as none to slight agreement; 0.21-0.40 as fair agreement; 0.41-0.60 as moderate agreement; 0.61-0.80 as substantial agreement; and 0.81-1.00 as almost perfect agreement [14]. Next, we compared the characteristics of the children with and without discrepancies between the CR interpretations of the ED pediatricians and of the board-certified radiologists. For these analyses, CRs with partial agreement were omitted. Nominal variables were compared using Pearson's χ² test; continuous variables that matched parametric criteria were compared using Student's t-test; and ordinal variables or continuous variables that did not match parametric criteria were compared using the Mann-Whitney U test. Data for some parameters were missing in a minority of patients. No significant differences were observed in missing data between those with and without agreement. Therefore, missing data were omitted from the analysis. A binary logistic regression was performed that included age, sex, parameters reaching p < 0.07 (i.e., a trend toward significance), and other clinically relevant parameters. A p value of ≤ 0.05 was considered significant. Data were analyzed using Statistical Package for the Social Sciences (SPSS) statistical software, version 24 (SPSS Inc, Chicago, Illinois).
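
For readers less familiar with the statistic, the sketch below computes Cohen's kappa from a square contingency table (rows: one rater, columns: the other) and maps the value onto the interpretation bands listed above. The functions and the 2×2 example counts are hypothetical and purely illustrative; they are not study data, and the study itself used SPSS.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square table of counts (rater A rows, rater B columns)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: share of cases on the diagonal.
    p_observed = sum(table[i][i] for i in range(k)) / n
    # Expected chance agreement from the row and column marginals.
    row_share = [sum(table[i]) / n for i in range(k)]
    col_share = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_expected = sum(r * c for r, c in zip(row_share, col_share))
    return (p_observed - p_expected) / (1 - p_expected)


def interpret_kappa(kappa):
    """Map kappa onto the agreement bands given in the text."""
    if kappa <= 0:
        return "no agreement"
    if kappa <= 0.20:
        return "none to slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical counts, not study data: 35 of 50 readings coincide.
example = [[20, 5],
           [10, 15]]
kappa = cohens_kappa(example)
print(round(kappa, 3), interpret_kappa(kappa))
```

Kappa corrects the raw proportion of matching readings for the agreement expected by chance alone, which is why it can be reported without designating either rater as the reference.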

Study cohort
A flowchart of the study cohort is depicted in Fig. 1. A total of 1373 CRs were included in the final cohort. Their characteristics are detailed in Table 1, together with indications for the imaging, primary radiologists' interpretations, and final diagnoses at discharge.

CR interpretations and inter-observer agreement
Cohen's kappa for inter-observer agreement was calculated in various scenarios (Table 2). For the whole cohort, the kappa for agreement between radiologists and ED pediatricians was moderate, 0.505 (95% CI 0.455-0.554). Next, we divided the categories for CR interpretation into two major groups, according to the need for antibiotic treatment. Recalculating Cohen's kappa accordingly, we found moderate agreement, 0.508 (95% CI 0.459-0.557). The kappa was 0.495 [95% CI 0.436-0.553] for the 1068 CRs interpreted by residents, and 0.512 [95% CI 0.405-0.618] for the 305 CRs interpreted by board-certified pediatricians. Kappa was also calculated according to indication, considering only radiographs that were obtained due to respiratory symptoms or fever, to rule out bacterial pneumonia; among those, kappa was moderate as well, 0.440 [95% CI 0.411-0.470]. Kappa for other indications was not calculated due to the low number of cases. Finally, kappa calculated for all the radiographs obtained after midnight (n = 332) was found to be "fair," 0.391 [95% CI 0.282-0.500].
The expert panel concluded that for 1014 CRs (73.9% of the total), there was a "clinically relevant agreement" between the interpretations of the radiologists and the ED pediatricians. For 260 CRs (18.9%) there was "no-agreement," and for 99 (7.2%) CRs, the experts agreed unanimously to categorize as "partial agreement."
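
As a reading aid for the intervals above, the sketch below back-calculates an approximate standard error from a reported 95% CI and checks whether two intervals overlap. It assumes symmetric normal-approximation (Wald-type) intervals, which the text does not confirm; the function names are hypothetical. Overlapping intervals are only a descriptive observation and do not amount to a formal test of difference.

```python
Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def approx_se_from_ci(lower, upper, z=Z_95):
    """Approximate standard error implied by a symmetric 95% CI (assumption)."""
    return (upper - lower) / (2 * z)


def intervals_overlap(a, b):
    """True if closed intervals a=(lo, hi) and b=(lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Intervals as reported in the text.
whole_cohort = (0.455, 0.554)
after_midnight = (0.282, 0.500)
residents = (0.436, 0.553)
board_certified = (0.405, 0.618)

print(round(approx_se_from_ci(*whole_cohort), 4))
print(intervals_overlap(whole_cohort, after_midnight))
print(intervals_overlap(residents, board_certified))
```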

Antibiotic prescriptions and other interventions
In practice, antibiotic treatment was administered or recommended for 618 (45.0%) patients (due to numerous indications including bacterial pneumonia, suspected occult bacteremia, acute otitis media, and urinary tract infection).
The expert panel concluded that in 122 CRs (8.9% of the whole cohort; 60.7% of the discrepancies related to antibiotic need), the prescription of the antibiotics was based on misinterpretation of the radiograph and therefore not justified. In contrast, among patients who were not recommended to take antibiotics, 79 (5.8% of the whole cohort; 39.3% of the discrepancies related to antibiotic need) were actually indicated to be treated with antibiotics according to the radiologists' interpretations (Fig. 2). Other interventions were required by the radiologists' interpretation of 53 CRs (CT chest for clarifying inconclusive findings, pulmonologist referral for the suspicion of chronic lung disease, repeating CR in the future, etc.). Only for 23 CRs (43.3%) was the intervention conducted as recommended. It is important to note that the other patients and families were contacted by the study researchers and directed regarding required action.
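
The over-treatment and under-treatment shares reported above follow from simple proportions. The sketch below reproduces that arithmetic from the counts given in the text; the helper `pct` and the constant names are hypothetical.

```python
def pct(part, whole, digits=1):
    """Percentage of `part` in `whole`, rounded to `digits` decimals."""
    return round(100 * part / whole, digits)


TOTAL_CRS = 1373      # final cohort
OVER_TREATED = 122    # antibiotics prescribed based on misinterpretation
UNDER_TREATED = 79    # antibiotics indicated per radiologist but not given

# Denominator for the "discrepancies related to antibiotic need" shares.
antibiotic_discrepancies = OVER_TREATED + UNDER_TREATED

print(pct(OVER_TREATED, TOTAL_CRS), pct(UNDER_TREATED, TOTAL_CRS))
print(pct(OVER_TREATED, antibiotic_discrepancies),
      pct(UNDER_TREATED, antibiotic_discrepancies))
```

Note the two different denominators: the whole cohort for the first pair of percentages, and the 201 antibiotic-related discrepancies for the second pair.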

Risk factors for disagreement
To identify risk factors associated with diagnostic disagreement, the cohort was divided into two groups according to the expert panel attribution: "clinically relevant agreement" versus "disagreement." The groups were compared according to parameters that could have been identified prior to the establishment of the final diagnosis (Table 3). Age, sex, and any medical background were similar between the groups.

Discussion
The current study found moderate agreement (k = 0.505) between pediatricians at the ED and board-certified radiologists in interpreting pediatric CR. For 18.9% of the CRs, there was clinically relevant disagreement between interpreters. Antibiotic over-treatment was more common than under-treatment. Major clinical parameters that were associated with interpretation discrepancy were as follows: fever ≥ 38℃, prior neurological condition, dyspnea, and the interpretation occurring during the night shift.
CR is one of the most frequently performed imaging tests, being easy to perform, low in cost, and readily available even in low-resource countries. CR provides important information regarding pediatric illness. However, its interpretation can be challenging, and significant variability and discrepancy in findings may lead to unnecessary medication or incorrect management. Fair to moderate levels of agreement, in the range of 0.2-0.68, have been reported [7,8,18-20]. Unlike most previous studies, the current study explored real-life clinical cases with "bedside" interpretation. Before conducting the study, we hypothesized that in these settings the level of agreement would be higher, as the patients' clinical conditions could provide useful clues to interpret the CRs even for less experienced physicians. Our results are consistent with those reported by Soudack et al., who found discordant interpretations in 28% of patients in the ED [5]. The upshot is that CR interpretation in real-life clinical situations should be taken with caution, and interpretation by a radiologist may aid in reaching a proper diagnosis.
A strength of the study is the unique approach, which focused on clinical, and not only radiological, parameters when identifying risk factors for disagreement. Clinical application of the model may help providers identify CRs that should be given high priority for obtaining interpretation by an experienced radiologist before reaching final clinical decisions. Additionally, these results can guide radiologists in setting priorities for interpretations, and potentially save economic resources while optimizing care for patients [21].
Importantly, there is an inherent problem in identifying a "gold standard" in this field, as reaching an absolute diagnosis may require a complex investigation (such as a chest CT scan and bronchoalveolar lavage) which is not feasible (or recommended) for every patient requiring a chest X-ray. Under these conditions, an interpretation by a board-certified radiologist is considered the best achievable gold standard [15]. Indeed, some studies reported good inter-observer agreement among radiologists [22,23]. However, clinical decisions result from a multitude of factors, of which imaging is just one. Therefore, the gold standard should be approached with caution, and measures like Cohen's kappa, which examine agreement without bias towards a specific interpreter, should be preferred [14]. Accordingly, caution should be exercised regarding data related to the overuse/underuse of antibiotics, as antibiotic prescribing may reflect more than just a CR result. Nonetheless, this information is important for evaluating the clinical significance of the major findings and was reported in other studies as well [5,18,24]. Notably, most other findings in this study (including the clinical score) are based on agreement between interpreters, without indicating the "correct" interpreter.
We concluded that the most significant parameters associated with interpretation discrepancies were as follows: fever ≥ 38℃, a background neurological condition, dyspnea, and the interpretation occurring during the night shift. We assume that patients with background neurological conditions show a higher tendency for scoliosis and chronic lung diseases. These conditions can obscure physicians' interpretations of the CRs; in addition, many of these CRs are taken at the bedside, leading to technical issues and artifacts [25]. Night shifts are known to be prone to medical errors, whether due to the predominance of trainees during these hours, with the lack of experienced physicians to consult with, or due to physician fatigue [26-28]. Interestingly, unlike others [7,8], we did not find a statistically significant difference between the level of agreement within the group of board-certified pediatricians compared to the level of agreement within the group of resident pediatricians. We speculate that this may be due to the very high proportion of trainees in the general cohort (which is apparently related to the pre-selected times that are considered off-hours).
The higher risk of disagreement in the context of fever ≥ 38℃ and dyspnea could be related to the higher probability of lower airway involvement, thus complicating the interpretation [14]. Alternatively, familiarity with the patient's clinical symptoms may subconsciously affect the interpretation of CR. Either way, extra caution must be taken when interpreting CR in these contexts.
In many hospitals and imaging centers, conventional films were gradually replaced by Picture Archiving and Communication Systems (PACS). These changes have economic benefits [29], but their influence on the accuracy of interpretation is less clear. Nevertheless, some studies reported that radiograph interpretations with digital studies remain as accurate as assessments performed using conventional radiographs [30,31], or even that sensitivity was improved after introduction of PACS [32]. Although the current study was conducted with digital high-resolution images, it is difficult to compare the results to other studies in the field, as not all the authors designated the specific technology they used. Of note, the future introduction of artificial intelligence to EDs may potentially increase the accuracy of CR interpretations [33]. Studies are needed to estimate this improvement.

Fig. 3 Variables associated with interpretation discrepancies in pediatric chest radiographs at the emergency department: binary logistic regression results (odds ratios with 95% confidence intervals); CR: chest radiograph

Limitations
The retrospective design is a limitation of this work, which precludes access to the complete clinical picture of every patient and the establishment of the final role of CR in patient management. To overcome this, an expert panel reviewed every patient's medical record, in an effort to mitigate bias. As discussed above, there is no absolute usable gold standard for CR; thus, most of the results in this study were based on level of agreement rather than on "incorrect interpretations." In addition, the study was conducted in a tertiary pediatric hospital with a significant rate of patient complexity, together with a high patient volume. This sometimes necessitated rapid decision-making, and the results should be considered in the relevant context. As several parameters (not only CR interpretation) are considered in clinical decisions, the fundamental necessity of providing a radiologist's interpretation for CRs cannot be concluded from this study, but rather only the proportion of CRs with discrepancy in interpretation and their characteristics. Finally, the WHO assessment method of CR mentioned above is not relevant to some of the less prevalent diagnoses (including foreign body in airways and pneumothorax).

Conclusions
In conclusion, we report a moderate level of agreement between pediatric providers and board-certified radiologists in interpreting CR in ED settings. Disputed CRs resulted in more antibiotic over-treatment than under-treatment. Clinical risk factors for disagreement are presented, with the aim of identifying CRs at high risk for disagreement in interpretation. Implementing these results at the pediatric ED can facilitate the utilization of radiologists' expertise, save time and resources, and potentially improve patient care.

Fig. 2
Fig. 2 Clinically relevant agreement levels of chest radiographs according to the expert panel. Note: The pie chart summarizes only discrepancies related to antibiotic indications

Table 2
Summary of Cohen's kappa for inter-observer agreement, according to various scenarios